Vespa schema changes for query control & general quality of life #163

Open
wants to merge 16 commits into
base: main

kdutia
Member

@kdutia commented Dec 18, 2024

Description

A batch of changes to the schema making use of inheritance, input parameters and summaries. These aim to give us more control over search without requiring further schema changes in future. The easiest way to review this PR is commit by commit.

Change since reviews of the draft PR: you can now add a custom request body in SearchParameters. This gives us full control over the SearchAdaptor without having to push field changes through the schema and then the Pydantic models, and more immediately gives us control over the weightings of individual query components. I've demonstrated this in a test which shows that setting the closeness features to 0 is the same as using the hybrid_no_closeness rank profile.
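As a rough illustration of the custom request body idea (a sketch only, not the SDK's actual implementation): the helper name `build_request_body` and the `closeness_weight` input are hypothetical, while `yql`, `ranking.profile` and `input.query(...)` are standard Vespa query API parameters. The assumption is that custom keys simply take precedence over the defaults.

```python
from typing import Any, Dict, Optional


def build_request_body(
    default_body: Dict[str, Any], custom_body: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a new body in which custom keys override the defaults."""
    merged = dict(default_body)  # copy so the defaults are left untouched
    if custom_body:
        merged.update(custom_body)
    return merged


# Hypothetical defaults and overrides; the input name is illustrative only.
default_body = {
    "yql": "select * from sources * where userQuery()",
    "ranking.profile": "hybrid",
    "input.query(closeness_weight)": 1.0,
}
custom_body = {"input.query(closeness_weight)": 0.0}

merged_body = build_request_body(default_body, custom_body)
```

Here the closeness weight is switched off per request, while every other default is carried through unchanged.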

Also:

  • sets document languages to English
  • (removed) adds text fields with _bolding suffixes, which give the bolded version of each when a search is done: this didn't work, so I have deleted it from this PR
  • adds a new hybrid profile with nativeRank, Vespa's alternative to BM25 that gives you a little bit more control and produces normalised scores
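To make the weighting idea concrete, here is a minimal sketch that assumes a hybrid score is a plain weighted sum of a normalised lexical score and a closeness score. The actual rank expressions in the schema may differ, and both function names and all weights below are hypothetical. Under that assumption, a closeness weight of 0 reduces the hybrid score to the lexical-only score.

```python
def hybrid_score(
    lexical: float,
    closeness: float,
    lexical_weight: float = 1.0,
    closeness_weight: float = 1.0,
) -> float:
    """Weighted sum of a lexical score and a closeness score (assumed form)."""
    return lexical_weight * lexical + closeness_weight * closeness


def lexical_only_score(lexical: float, lexical_weight: float = 1.0) -> float:
    """Score used by a hypothetical profile that ignores closeness."""
    return lexical_weight * lexical


# With the closeness weight set to 0, both functions agree.
with_zero_closeness = hybrid_score(0.4, 0.9, closeness_weight=0.0)
without_closeness = lexical_only_score(0.4)
```

Having both components on a comparable, normalised scale is what makes weights like these meaningful to tune.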

Sidenote: I tried to test this on the backend using the Test PyPI package published by this PR's CI, but the Vespa dependency seemed to be broken in it. Not sure whether this is just me or it's actually broken 🤷

Proposed version

Please select the most relevant option from the list below. This
will be used to generate the next tag version name during auto-tagging.

  • Skip auto-tagging
  • Patch
  • Minor version
  • Major version

Visit the Semver website to understand the
difference between MAJOR, MINOR, and PATCH versions.

Notes:

  • If none of these options are selected, auto-tagging will fail (integrated soon)
  • Where multiple options are selected, the most senior option ticked will be
    used -- e.g. Major > Minor > Patch
  • If you are selecting the version in the list above using the textbox, make
    sure your selected option is marked [x] with no spaces in between the
    brackets and the x

Type of change

Please select the option(s) below that are most relevant:

  • Bug fix
  • New feature
  • Breaking change

How Has This Been Tested?

Please describe the tests that you added to verify your changes.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

linear bot commented Dec 18, 2024

@kdutia changed the title from "set of schema changes" to "Vespa schema changes for query control & general quality of life" Dec 18, 2024
@kdutia marked this pull request as ready for review December 18, 2024 12:17
@kdutia requested a review from a team as a code owner December 18, 2024 12:17
@kdutia marked this pull request as draft December 18, 2024 12:20
@kdutia marked this pull request as ready for review December 18, 2024 12:31
@kdutia marked this pull request as draft December 18, 2024 13:10
Contributor

@olaughter left a comment

This looks good to me! Left some comments below. I'm looking forward to seeing how we get on with nativerank!

@kdutia marked this pull request as ready for review December 19, 2024 11:03
@kdutia requested review from olaughter and jesse-c December 19, 2024 11:03
Contributor

@olaughter left a comment

LGTM!

Small suggestion below but non-blocking 👍

src/cpr_sdk/models/search.py (resolved)
Comment on lines +108 to +114
overlapping_keys = set(vespa_request_body.keys()) & set(
    parameters.custom_vespa_request_body.keys()
)
if overlapping_keys:
    _LOGGER.warning(
        f"Custom request body contains overlapping keys that will override defaults: {overlapping_keys}"
    )
Contributor

This overlapping-keys check feels like it could be lifted into a distinct utility method, which would also make it easy to unit test.

Member Author

An important part of this to me is that the warning specifies that it is the request body that contains overlapping keys, which I think we'd then lose if we lifted this out to a utility method? Wdyt?
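One possible way to reconcile both points, sketched under assumptions: a small pure helper that is easy to unit test, plus a thin wrapper that takes a caller-supplied description so the warning can still say that it is the custom request body that overlaps. The names `find_overlapping_keys` and `warn_on_overlap` are hypothetical and not part of the SDK.

```python
import logging
from typing import Any, Mapping, Set

_LOGGER = logging.getLogger(__name__)


def find_overlapping_keys(
    defaults: Mapping[str, Any], overrides: Mapping[str, Any]
) -> Set[str]:
    """Return the keys present in both mappings."""
    return set(defaults) & set(overrides)


def warn_on_overlap(
    defaults: Mapping[str, Any], overrides: Mapping[str, Any], description: str
) -> Set[str]:
    """Log a warning naming the source of the overrides, then return the overlap."""
    overlapping_keys = find_overlapping_keys(defaults, overrides)
    if overlapping_keys:
        _LOGGER.warning(
            "%s contains overlapping keys that will override defaults: %s",
            description,
            sorted(overlapping_keys),
        )
    return overlapping_keys


# Example call; the dictionaries here are illustrative only.
overlap = warn_on_overlap(
    {"hits": 10, "ranking.profile": "hybrid"},
    {"ranking.profile": "hybrid_no_closeness"},
    description="Custom request body",
)
```

The call site keeps control of the wording through `description`, while the set logic can be tested without touching the logger.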

src/cpr_sdk/vespa.py (resolved)